A few of the referenced features (e.g., AnovaRM, Markdown cells that call Python variables, and anything involving live MATLAB or R code) will not run on the Binder instance of Jupyter. However, almost everything else should.
Use the following to set up both Python 2 and Python 3 environments:
# install everything with Python 2 and 3.
conda create -n py36 python=3.6 anaconda
conda create -n py27 python=2.7 anaconda
# register py27 kernel - no need for "source" on windows
source activate py27
ipython kernel install
# same for py36, and install jupyterhub in the py36 env
source activate py36
ipython kernel install
pip install jupyterhub
If your Jupyter instance of Python isn't seeing your installed packages, it is probably pointing to the wrong Python or the wrong path. First, diagnose the problem in Jupyter by running !which python (on Mac/Linux) and import sys; sys.path. The first command shows where it's looking for Python itself (using the terminal), while the second shows where it's looking for packages. The answers should be the same in Jupyter and in your own terminal; if they differ, you've found your problem.
You should be able to fix either problem by activating your chosen environment, and running python -m ipykernel install --user
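The diagnosis above can also be done entirely from Python. The following is a minimal sketch that only uses the standard library; the function name describe_interpreter is just an illustrative choice. Run it once in Jupyter and once in a plain terminal Python session, then compare the output.

```python
import shutil
import sys


def describe_interpreter():
    """Collect the facts needed to compare Jupyter's Python with the terminal's."""
    return {
        # The Python executable that is running this code right now
        'executable': sys.executable,
        # The first "python" found on the PATH (None if there isn't one)
        'python_on_path': shutil.which('python'),
        # The directories searched when importing packages
        'package_search_path': list(sys.path),
    }


info = describe_interpreter()
for key, value in info.items():
    print(key, '->', value)
```

If 'executable' differs between the two sessions, the kernel is registered against a different Python than the one you install packages into.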
pip install insert_package_name_here
sudo if you're on a Mac.
conda install insert_package_name_here if you run into issues with pip
conda install -c conda-forge insert_package_name_here is also an option for certain packages.
jupyter contrib nbextension install --user
jt -t grade3 -fs 12 -tfs 12 -nfs 115 -cellw 88% -T
jt -r
python -m nbopen.install_xdg
python -m nbopen.install_win
./osx-install.sh
python setup.py install
.yml file from Gary Lupyan's github
psychopy.yml
conda-env create -f psychopy.yml -n psychopy
source activate psychopy (no need for source on Windows)
jupyter lab or jupyter notebook
http://localhost:8888/lab or http://localhost:8888/tree respectively.
jupyter nbconvert --to html_toc FILENAME.ipynb
c = get_config()
c.Exporter.preprocessors = ['pre_pymarkdown.PyMarkdownPreprocessor']
Double-click on the cells to see how everything was written!
Headings are made with preceding "#" signs. <h1> is #, <h2> is ##, etc.
Force new blank lines with <br> .
Italics are made by surrounding a word or phrase with asterisks, or with underscores, like so.
Bold words are made by surrounding a word or phrase with 2 asterisks on each end.
You can make a phrase both bold and italic by combining the above!
Put a ">" before a line to turn it into a blockquote.
Unhighlighted code goes between backticks: this is code
And you can define blocks of code by sandwiching them between 3 backticks on either end (you can even define syntax highlighting!)
x = [1, 2, 3]
for i in x:
    print(i)
Hyperlinks go in square brackets, with the link itself going in parentheses immediately after (no whitespace allowed between neighboring brackets)!
Images are set up just like hyperlinks, but with an exclamation point in front. The writing in square brackets serves as the alt-text for the image.

%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/HW29067qVWk" frameborder="0" allowfullscreen></iframe>
# %%HTML
# <iframe src="https://fiddle.jshell.net/rahonavis75/ed4486f9/show/" width="800" height="500">
Sandwich inline LaTeX between single dollar signs, or a displayed equation between double dollar signs, as below.
$$
\begin{equation*}
\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)
\end{equation*}
$$
If you wanted, you could literally write your paper in a Jupyter notebook! To do this, you would combine your analysis script with your manuscript by feeding the results of the former directly into the latter. Here's an example where I feed a variable into a Markdown cell.
foo = 100
foo is {{foo}}
See all available magic commands.
%lsmagic
See list of current variables in global scope. Can also specify a data type thereafter.
%who
And run terminal commands directly with "!"
!pip list
SHIFT+TAB will bring up help for your current function
CTRL+Enter executes the current cell, keeping your focus on it
SHIFT+Enter executes the current cell, and moves you down to the next cell
ALT+Enter executes the current cell AND makes a new one below
ESC brings you to command mode, where you can do a number of things:
  A makes a new cell above
  B makes a new cell below
  D D (that's D twice) deletes a cell
  X cuts selected cells
  C copies the cells
  V pastes the cells
  Y turns the cell into code
  M turns the cell into Markdown
CTRL+SHIFT+F brings up the command palette, with all available commands
Giant pandas tutorial and attendant notes available at the links.
Allow plots in the notebook itself, and enable some helpful functions.
%reset -f
%matplotlib inline
%config InlineBackend.figure_format = 'retina' # High-res graphs (rendered irrelevant by svg option below)
%config InlineBackend.print_figure_kwargs = {'bbox_inches':'tight'} # No extra white space
%config InlineBackend.figure_format = 'svg' # 'png' is default
import warnings
warnings.filterwarnings('ignore') # Because we are adults
Import example data.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = sns.load_dataset('tips')
data.head() # show first n entries (default is 5)
Change default graph appearance to something you like. See here for full list of available built-in styles.
sns.set_style("ticks") # other built-in options: darkgrid, whitegrid, dark, white
## Define custom color palette
# flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# sns.set_palette(flatui)
# sns.palplot(sns.color_palette())
Plot histograms of tips grouped by sex side by side. Make sure both have the same x and y limits.
data['tip'].hist(by=data['sex'], sharex=True, sharey=True)
sns.despine() # Remove top and right side of box
plt.show() # Somewhat redundant in this context, but suppresses annoying text output.
Plot overlaid histograms.
grouped_by_sex = data.groupby('sex')
# You can also add several arguments below like bins=20, or density=True
figure, axes = grouped_by_sex['tip'].plot(kind='hist', density=False, alpha=.5, legend=True)
# Re-label legend entries, move legend to right-middle
axes.legend(['Men', 'Women'], loc=(0.75, 0.5))
sns.despine()
plt.show()
Show summary stats for the sexes.
grouped_by_sex['tip'].describe()
Get a subset of the data — here the tips given on Sunday at dinner time.
sunday_dinner_tips = data.tip[(data.day=="Sun") & (data.time=="Dinner")]
Add a new column showing the percentage of the total bill tipped using a lambda expression. Naturally, you can also accomplish this by defining a named function.
data['tip_percentage'] = data.apply(lambda row: row['tip']/row['total_bill']*100, axis=1)
data.head()
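As a sketch of the named-function route mentioned above, here is the same calculation on a tiny made-up DataFrame (the three rows are invented for illustration and are not taken from the tips dataset). It also shows a vectorized alternative that skips the row-by-row apply altogether; the names toy and tip_percentage are my own.

```python
import pandas as pd

# Tiny made-up frame that reuses the column names of the tips dataset
toy = pd.DataFrame({'total_bill': [10.0, 20.0, 40.0],
                    'tip': [1.0, 3.0, 10.0]})


def tip_percentage(row):
    """Return the tip as a percentage of the total bill for a single row."""
    return row['tip'] / row['total_bill'] * 100


# Row-wise apply with a named function (same result as the lambda version)
toy['tip_percentage'] = toy.apply(tip_percentage, axis=1)

# Vectorized alternative: divide whole columns at once, no apply needed
toy['tip_percentage_vectorized'] = toy['tip'] / toy['total_bill'] * 100

print(toy)
```

On large frames the vectorized form is usually much faster, while the named function is easier to document and reuse.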
Delete that new column.
del data['tip_percentage']
data.head()
Perform an ANOVA, using R-style syntax.
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = 'tip ~ sex * smoker'
lm = ols(model, data=data).fit()
table = sm.stats.anova_lm(lm, typ=2)
display(table)
Make the table prettier and more intelligible.
from prettypandas import PrettyPandas
def color_significant_green(val, alpha=0.05):
    if val < alpha: color = 'green'
    else: color = 'black'
    return 'color: %s' % color
def bold_significant(val, alpha=0.05):
    if val < alpha: font_weight = 'bold'
    else: font_weight = 'normal'
    return 'font-weight: %s' % font_weight
t = PrettyPandas(table)
(
t.applymap(color_significant_green, alpha=.05, subset=['PR(>F)']) # alpha is optional here, of course
.applymap(bold_significant, alpha=.05, subset=['PR(>F)'])
.format("{:.3f}", subset=['sum_sq', 'F', 'PR(>F)']) # show only 3 decimal places
)
from numpy import sqrt
from scipy.stats import ttest_ind
def cohens_d(t, n):
    return 2*t / sqrt(n - 2)
# Set up empty results table
columns = ['n', 't', 'p', 'd']
index = []
results = pd.DataFrame(index=index, columns=columns)
# Get data for t-test
male_tips = data[data['sex']=='Male']['tip']
female_tips = data[data['sex']=='Female']['tip']
# Perform t-test and surrounding calculations
n = male_tips.count() + female_tips.count()
df = n-2
t, p = ttest_ind(male_tips, female_tips)
d = cohens_d(t, n)
# Add data to table
comparison = 'Male vs. Female'
results.loc[comparison] = [n, t, p, d]
# Output pretty table
r = PrettyPandas(results)
(
r.applymap(color_significant_green, subset=['p'])
.applymap(bold_significant, subset=['p'])
.format("{:.3f}", subset=['t', 'p', 'd'])
)
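The cohens_d helper above converts t to d with 2t/sqrt(df), a commonly used approximation that is closest when the two groups are the same size. As a rough cross-check, here is a minimal standard-library sketch that computes the pooled-variance t statistic and Cohen's d directly from two samples. The two samples are made-up numbers, and pooled_t_and_d is a hypothetical helper name, not something from scipy.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_and_d(group_a, group_b):
    """Return (t, d) for two independent samples using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    pooled_sd = sqrt(pooled_var)
    diff = mean(group_a) - mean(group_b)
    t = diff / (pooled_sd * sqrt(1 / n_a + 1 / n_b))
    d = diff / pooled_sd  # Cohen's d: mean difference in pooled-SD units
    return t, d


# Made-up samples, purely for illustration
sample_a = [2, 4, 6, 8]
sample_b = [1, 3, 5, 7]

t_stat, d_exact = pooled_t_and_d(sample_a, sample_b)
d_approx = 2 * t_stat / sqrt(len(sample_a) + len(sample_b) - 2)
print(t_stat, d_exact, d_approx)
```

With samples this small the approximation is visibly larger than the directly computed d; the gap shrinks as the sample size grows.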
from IPython.display import Markdown
inequality_symbol = "="
def report_t_test(df, t, p, d, alpha=.001):
    if p < alpha:
        p = .001
        inequality_symbol = "<"
    else:
        inequality_symbol = "="
    T = format(t, '.2f').lstrip('0') # 2 decimal places, no leading 0
    P = format(p, '.3f').lstrip('0')
    D = format(d, '.3f').lstrip('0')
    DF = format(df, 'd') # integer
    output = ('*t*({0})={1}, *p*' + inequality_symbol + '{2}, *d*={3}').format(DF, T, P, D)
    display(Markdown(output))
report_t_test(df, t, p, d)
And in plain markdown: t({{n-2}})={{format(t, '.2f').lstrip('0')}}, p{{inequality_symbol}}{{format(p, '.3f').lstrip('0')}}, d={{format(d, '.3f').lstrip('0')}}
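One thing to watch with the lstrip('0') trick used above: it leaves negative values untouched, because the formatted string starts with a minus sign rather than a zero (format(-0.52, '.2f').lstrip('0') is still '-0.52'). Below is a minimal sketch of a formatter that handles the sign separately and returns a plain string instead of displaying Markdown. The helper names strip_leading_zero and format_t_test, and the numbers passed in, are hypothetical.

```python
def strip_leading_zero(value, decimals):
    """Format a number to fixed decimals, dropping the leading zero (e.g. .031)."""
    text = format(abs(value), '.{}f'.format(decimals))
    if text.startswith('0.'):
        text = text[1:]  # '0.031' -> '.031'
    sign = '-' if value < 0 else ''
    return sign + text


def format_t_test(df, t, p, d, floor=.001):
    """Return a report string; p-values below the floor are shown as p<floor."""
    if p < floor:
        p_text = 'p<' + strip_leading_zero(floor, 3)
    else:
        p_text = 'p=' + strip_leading_zero(p, 3)
    return 't({0})={1}, {2}, d={3}'.format(
        df, strip_leading_zero(t, 2), p_text, strip_leading_zero(d, 3))


# Hypothetical values, only to show the formatting
print(format_t_test(242, 1.3879, 0.1664, 0.179))
print(format_t_test(242, -0.52, 0.00004, -0.067))
```

Returning a string keeps the formatting logic testable; the result can still be passed to display(Markdown(...)) in the notebook.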
Requires development version of statsmodels package, available here.
pip install git+insert_link_here
import pandas as pd
import numpy as np
import statsmodels
from statsmodels.stats.anova import AnovaRM
statsmodels.__version__
Create simulated reaction time data for 2 levels of an independent variable.
N = 20
P = [1,2]
values = [998,511]
sub_id = [i+1 for i in range(N)]*len(P)
mus = np.concatenate([np.repeat(value, N) for value in values]).tolist()
rt = np.random.normal(mus, scale=112.0, size=N*len(P)).tolist()
iv = np.concatenate([np.array([p]*N) for p in P]).tolist()
df = pd.DataFrame({'id': sub_id, 'rt': rt, 'iv':iv})
Do the repeated measures ANOVA.
aovrm = AnovaRM(df, depvar='rt', subject='id', within=['iv'])
fit = aovrm.fit()
fit.summary()
Plot simple line graph with sample data.
line_data = range(1,10)
plt.figure()
plt.title("Example Graph", size="xx-large") # can also feed font point size, like 36
plt.xlabel("X-Axis Label", size="x-large")
plt.ylabel("Y-Axis Label", size="x-large")
plt.xlim(0,10)
plt.ylim(0,10)
plt.plot(line_data, 'b*-', markersize=10, linewidth=3, label='Sample Data') # b*- means blue star marker with line
plt.tick_params(axis="both", which="major", labelsize=14)
plt.legend(loc=(0.25, 0.75), scatterpoints=1)
plt.show()
Plot Anscombe's quartet.
import seaborn as sns
sns.set(style="ticks")
# Load the example dataset for Anscombe's quartet
anscombe = sns.load_dataset("anscombe")
# Show the results of a linear regression within each dataset
# Semi-colon suppresses the non-graph output
ax = sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=anscombe,
                col_wrap=2, ci=None, palette="muted", height=4,
                scatter_kws={"s": 50, "alpha": 1});
# Change axis labels
ax.set(xlabel='X', ylabel='Y');
Naturally, this defaults to showing a 95% confidence interval.
ax = sns.barplot(x="day", y="total_bill", data=data, capsize=0.1)
Plot violin plot with overlaid beeswarm plot.
fig, ax = plt.subplots()
# Output to the size of A4 paper
fig.set_size_inches(11.7, 8.27)
# Overlay a swarmplot on top of a violinplot
ax = sns.violinplot(x="day", y="total_bill", data=data, inner=None)
ax = sns.swarmplot(x="day", y="total_bill", data=data, color="white")
def set_titles(thisPlot, titleList, fontSize):
    for ax, title in zip(thisPlot.axes.flat, titleList):
        ax.set_title(title, fontsize=fontSize)

def set_labels(thisPlot, xLabel, yLabel, fontSize):
    thisPlot.set_xlabels(xLabel, fontsize=fontSize)
    thisPlot.set_ylabels(yLabel, fontsize=fontSize)

def set_xtick_labels(thisPlot, tickList, fontSize):
    thisPlot.set_xticklabels(tickList, fontsize=fontSize)

def set_legend(thisPlot, legendEntries, fontSize):
    # find where last graph is so we can put the legend there
    maxIndex = max(thisPlot.axes.shape) - 1
    # format the legend, placing it outside the axes
    thisPlot.axes[0][maxIndex].legend(bbox_to_anchor=(1.05, 1), loc=2,
                                      fontsize=fontSize, borderaxespad=0.)
    legend = thisPlot.axes[0][maxIndex].get_legend()
    labels = legend.get_texts()
    for i, thisLabel in enumerate(labels):
        labels[i].set_text(legendEntries[i])
# Make plots -- many of these arguments are optional
barPlot = sns.catplot(x="day", y="total_bill", hue="sex",
                      col="time", kind="bar", data=data,
                      height=5, aspect=1, legend=False)
beeswarmPlot = sns.catplot(x="day", y="total_bill", hue="sex",
                           col="time", kind="swarm", dodge=True,
                           data=data, height=5, aspect=1, legend=False)
# Format them nicely!
# Axis labels
xLabel = ""# "Day"
yLabel = "Total Bill"
set_labels(barPlot, xLabel, yLabel, 20)
set_labels(beeswarmPlot, xLabel, yLabel, 20)
# Titles
title_list = ["Lunch", "Dinner"]
titles = [x.title() for x in title_list] # title-case each entry
set_titles(barPlot, titles, 30)
set_titles(beeswarmPlot, titles, 30)
# X axis tick labels or category labels
x_tick_labels = ["Thursday", "Friday", "Saturday", "Sunday"]
set_xtick_labels(barPlot, x_tick_labels, 15)
set_xtick_labels(beeswarmPlot, x_tick_labels, 15)
# Change legends
legendEntries = ["Male", "Female"]
set_legend(barPlot, legendEntries, 15)
set_legend(beeswarmPlot, legendEntries, 15)
# Save plots
# barPlot.savefig("barPlot.svg") # can also use other extensions, like .png
# beeswarmPlot.savefig("beePlot.svg")
Made using bokeh. See here for a great tutorial, and here for the attendant notebook. Code below adapted from linked code to our current dataset.
from bokeh.plotting import figure, output_notebook, show
this_plot= figure(width=600, height=600)
this_plot.circle(x=data['total_bill'], y=data['tip'], size=10, alpha=0.7)
output_notebook() # to output inline
show(this_plot)
Make better, more interactive plot. Let's plot a scatterplot of tip amount vs. total bill, separately for men and women.
from bokeh.plotting import figure, output_notebook, show, ColumnDataSource
import bokeh.models.tools as tools
# Get relevant subsets of data
male_data = data[data['sex'] == 'Male']
female_data = data[data['sex'] == 'Female']
# Convert to format bokeh understands
source_male = ColumnDataSource(male_data)
source_female = ColumnDataSource(female_data)
# Set up figure
this_plot = figure(width=600, height=600)
this_plot.circle(source=source_male, x='total_bill', y='tip', color='teal',
size=10, alpha=0.7, legend='Men')
this_plot.circle(source=source_female, x='total_bill', y='tip', color='darkorange',
size=10, alpha=0.7, legend='Women')
# Set axis labels
this_plot.xaxis.axis_label = "Total Bill"
this_plot.yaxis.axis_label = "Tip Amount"
# Show information when hovering the mouse over datapoints
this_plot.add_tools(tools.HoverTool(tooltips=[('Day', '@day')])) # use @ to choose feature from dataset
# Hide all circles of a given category when clicked in legend
this_plot.legend.click_policy = 'hide'
output_notebook()
show(this_plot)
import holoviews as hv
hv.extension('bokeh', 'matplotlib')
ds = hv.Dataset(data, kdims=["sex", "smoker", "total_bill"],
vdims=["time", "size", "day", "tip"])
%%output backend='bokeh'
%%output size=200
%%opts Scatter [tools=['hover']] (size=8 alpha=0.5)
kdims=["tip"]
vdims=["total_bill", "day", "time", "size"] # include "smoker" if you don't want it as drop-down choice
# Scatter plot with hover tool that includes all the things
scatter = ds.to(hv.Scatter, kdims, vdims).overlay('sex')
scatter
from pivottablejs import pivot_ui
pivot_ui(data)
import matplotlib.pyplot as plt
from ipywidgets import *
from numpy import pi, arange, sin
t = arange(0, 1.0, 0.01)
def pltsin(f):
    plt.plot(t, sin(2*pi*t*f))
    plt.show()
interact(pltsin, f=(1,10,0.1))
Plotly is another package for producing really nice and interactive graphs, but it requires signing up for an account to initialize it. After initialization you can use it online by default (which means all of your graphs get saved to the cloud for everyone to see forever) or you can use it offline (as demoed below). Examples taken or modified from here.
import plotly
# plotly.tools.set_credentials_file(username='XXX', api_key='XXX') # initialize with your credentials -- only need to do once ever.
from plotly.graph_objs import Scatter, Layout
plotly.offline.init_notebook_mode(connected=True)
plotly.offline.iplot({
"data": [Scatter(x=[1, 2, 3, 4], y=[4, 3, 2, 1])],
"layout": Layout(title="hello world")
})
When I first tried using plotly I sometimes got "IOPub data rate exceeded" errors. Here's how you fix that:
Run jupyter notebook --generate-config to generate a clean configuration file with all parameters commented out
In that file, uncomment and set c.NotebookApp.iopub_data_rate_limit and c.NotebookApp.iopub_msg_rate_limit to be some absurdly large numbers
import plotly.offline as py
import plotly.figure_factory as ff
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")
table = ff.create_table(df)
py.iplot(table, filename='plotly\\table1') # backslash escaped so that '\t' is not read as a tab
import plotly.offline as py
from plotly.graph_objs import *
data = [Bar(x=df.School,
y=df.Gap)]
py.iplot(data)
trace_women = Bar(x=df.School,
y=df.Women,
name='Women',
marker=dict(color='#ffcdd2'))
trace_men = Bar(x=df.School,
y=df.Men,
name='Men',
marker=dict(color='#A2D5F2'))
trace_gap = Bar(x=df.School,
y=df.Gap,
name='Gap',
marker=dict(color='#59606D'))
data = [trace_women, trace_men, trace_gap]
layout = Layout(title="Average Earnings for Graduates",
xaxis=dict(title='School'),
yaxis=dict(title='Salary (in thousands)'))
fig = Figure(data=data, layout=layout)
py.iplot(fig)
data = [dict(
visible = False,
line=dict(color='#00CED1', width=6),
name = '𝜈 = '+str(step),
x = np.arange(0,10,0.01),
y = np.sin(step*np.arange(0,10,0.01))) for step in np.arange(0,5,0.1)]
data[10]['visible'] = True
steps = []
for i in range(len(data)):
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(data)],
    )
    step['args'][1][i] = True # Toggle i'th trace to "visible"
    steps.append(step)
sliders = [dict(
active = 10,
currentvalue = {"prefix": "Frequency: "},
pad = {"t": 50},
steps = steps
)]
layout = dict(sliders=sliders)
fig = dict(data=data, layout=layout)
py.iplot(fig)
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)
r = 2 + np.sin(7 * sGrid + 5 * tGrid) # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid) # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid) # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid) # z = r*cos(t)
surface = Surface(x=x, y=y, z=z)
data = Data([surface])
layout = Layout(
title='Parametric Plot',
scene=Scene(
xaxis=XAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
),
yaxis=YAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
),
zaxis=ZAxis(
gridcolor='rgb(255, 255, 255)',
zerolinecolor='rgb(255, 255, 255)',
showbackground=True,
backgroundcolor='rgb(230, 230,230)'
)
)
)
fig = Figure(data=data, layout=layout)
py.iplot(fig)
You can generate Wes Anderson-inspired color palettes with the wes Python package.
That said, installation can be a little annoying, since you will often get an error for missing the colors.json file. If you get that error, simply download the tarball of the latest version of the package, extract colors.json and place it in the appropriate location (i.e., where the error tells you it cannot be found).
import wes
wes.available(show=True)
And set the palette with the following code:
wes.set_palette('Darjeeling')
for i in range(10):
    plt.plot(range(100), np.random.normal(i, 1, 100))
Use set_trace() where you want the debugger to start.
'n' moves onto the next line
'c' continues execution of the script
from IPython.core.debugger import set_trace
def increment_value(a):
    a += 1
    set_trace()
    print(a)
increment_value(3)
import inspect
import numpy as np
print(inspect.getsource(np))
inspect.getfile(np)
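Printing the source of an entire package, as above, produces a lot of output. The same inspect calls also work on a single function, which is usually what you want. Here is a small sketch using the standard-library json module as a stand-in (any importable pure-Python module or function should behave the same way).

```python
import inspect
import json

# Source code of one function rather than of the whole package
source = inspect.getsource(json.dumps)

# File that the json package was loaded from
location = inspect.getfile(json)

# Call signature, handy for checking argument names and defaults
signature = inspect.signature(json.dumps)

print(location)
print(signature)
print(source[:200])  # first 200 characters only
```

Note that getsource only works for code written in Python; functions implemented in C have no Python source to show.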
If you want to start digging deeper into Python, you can learn some cool things here, and here, and here.
That said, here is my favorite random snippet of python code ever. You can swap variable values without needing any temporary variables via tuple unpacking.
a = "A"
b = "B"
# Swap!
a, b = b, a
print("a = " + a)
print("b = " + b)
And extended unpacking is interesting to wrap your head around (Python 3 only).
a, *b, c = [1, 2, 3, 4, 5, 6]
print(a)
print(b)
print(c)
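To make the extended unpacking above a bit more concrete, here is a minimal sketch (Python 3 only) that splits a row into its first item, its last item, and everything in between. The function name split_row and the example rows are made up for illustration.

```python
def split_row(row):
    """Split a sequence into (first, middle, last); middle is always a list."""
    first, *middle, last = row
    return first, middle, last


# The starred name soaks up however many items are left over
first, middle, last = split_row(['id', 'q1', 'q2', 'q3', 'timestamp'])
print(first, middle, last)

# With only two items, the starred name becomes an empty list
print(split_row(['id', 'timestamp']))
```

The starred name always receives a list, even when it captures nothing, so code that loops over it never needs a special case.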
List comprehensions, and the closely related generator expressions (used below), are also extremely useful, allowing you to program almost as if you were writing a sentence in English.
# get sum of squares of the numbers 1 to 10 (range stops before 11)
sum(i**2 for i in range(1, 11))
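For comparison, here is a short sketch of the square-bracket form next to the generator form, plus a filtered variant. The variable names are just illustrative.

```python
# List comprehension: builds the whole list in memory
squares = [i**2 for i in range(1, 11)]

# Add a condition at the end to filter: squares of the even numbers only
even_squares = [i**2 for i in range(1, 11) if i % 2 == 0]

# Generator expression: same syntax without the brackets, values are
# produced one at a time, which is all that sum() needs
total = sum(i**2 for i in range(1, 11))

print(squares)
print(even_squares)
print(total)
```

Use the list form when you need to keep or index the results, and the generator form when you only need to consume them once.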
Zipping lists is another one of my favorite features.
a = ['a', 'b', 'c']
b = [1, 2, 3]
c = zip(a, b)
print(list(c)) # wrap in list() because zip returns a lazy iterator in Python 3
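Because a zip object is consumed as you iterate over it, it can only be read once. Here is a small sketch of that behavior, along with two common zip idioms: building a dictionary and "unzipping" pairs back into separate sequences. The variable names are illustrative.

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

pairs = zip(letters, numbers)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left the second time around

# Build a lookup table directly from two parallel lists
lookup = dict(zip(letters, numbers))

# "Unzip" by unpacking the pairs back into zip
unzipped_letters, unzipped_numbers = zip(*first_pass)

print(first_pass)
print(second_pass)
print(lookup)
print(unzipped_letters, unzipped_numbers)
```

If you need the pairs more than once, store list(zip(...)) in a variable rather than the zip object itself.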
Note that this requires running from a Python 3 instance of Jupyter (in my case, at least).
In theory, you should just be able to run this line and be all set, but it didn't work for me: conda install -c r r-essentials
If that didn't work, go through these steps:
install.packages('devtools')
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec() # to register the kernel in the current R installation
install.packages('ggplot2', dependencies=TRUE)
pip install rpy2 from your command line/terminal
pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl or whatever your .whl file is called from within the directory that has the file.
First, make some example data in Python.
import pandas as pd
df = pd.DataFrame({'Letter': ['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],
'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
Load extension allowing one to run R code from within a Python notebook.
%load_ext rpy2.ipython
Do stuff in R with cell or line magics. "-i" imports to R, "-o" outputs from R back to Python.
%%R
install.packages("ggplot2", dep=TRUE)
install.packages("tidyr", dep=TRUE)
install.packages("dplyr", dep=TRUE)
%%R -i df
library("ggplot2")
ggplot(data = df) + geom_point(aes(x = X, y = Y, color = Letter, size = Z))
pip install matlab_kernel
pip install pymatbridge
If you're getting a "zmq channel closed" error, open jupyter notebook from a different port when using MATLAB
jupyter notebook --port=8889
Load MATLAB extension for running MATLAB code within a Python notebook.
%load_ext pymatbridge
Let's try transposing an array from Python in MATLAB, then feeding it back into Python.
First, define an array.
a = [
[1, 2],
[3, 4],
[5, 6]
]
a
Now transpose it easily in MATLAB!
%%matlab -i a -o a
a = a'
Finally, check that Python has the correct value of a.
a
Here's an example of a MATLAB plot.
%%matlab
b = linspace(0.01,6*pi,100);
plot(sin(b))
grid on
hold on
plot(cos(b),'r')
Exit MATLAB when done.
%unload_ext pymatbridge
Note that Javascript executes as the notebook is opened, even if it's been exported as HTML!
%%javascript
console.log('hey!')